Fast Retrieval of Similar Subsequences in Long Sequence Databases

نویسندگان

Sanghyun Park

Dongwon Lee

Wesley W. Chu

چکیده

Although the Euclidean distance has been the most popular similarity measure in sequence databases, recent techniques prefer to use high-cost distance functions such as the time warping distance and the editing distance for wider applicability. However, if these distance functions are applied to the retrieval of similar subsequences, the number of subsequences to be inspected during the search is quadratic to the average length L of data sequences. In this paper, we propose a novel subsequence matching scheme, called the aligned subsequence matching, where the number of subsequences to be compared with a query sequence is reduced to linear to L. We also present an indexing technique to speed-up the aligned subsequence matching using the similarity measure of the modified time warping distance. The experiments on the synthetic data sequences demonstrate the effectiveness of our proposed approach; ours consistently outperformed the sequential scanning and achieved up to 6.5 times speed-up.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences

Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...

متن کامل

Similarity search of time-warped subsequences via a suffix tree

This paper proposes an indexing technique for fast retrieval of similar subsequences using the time warping distance. The time warping distance is a more suitable similarity measure than the Euclidean distance in many applications where sequences may be of different lengths and/or different sampling rates. The proposed indexing technique employs a disk-based suffix tree as an index structure an...

متن کامل

Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases

We introduce a new model of similarity of time sequences that captures the intuitive notion that two sequences should be considered similar if they have enough non-overlapping time-ordered pairs of subsequences thar are similar. The model allows the amplitude of one of the two sequences to be scaled by any suitable amount and its offset adjusted appropriately. Two subsequences are considered si...

متن کامل

Faster sequence homology searches by clustering subsequences

MOTIVATION Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. RESULTS We developed a fast homology search method based on database subsequence clusteri...

متن کامل

Similarity-Based Subsequence Search in Image Sequence Databases

This paper proposes an indexing technique for fast retrieval of similar image subsequences using the multi-dimensional time warping distance. The time warping distance is a more suitable similarity measure as compared to the Lp distance in many applications where sequences may be of different lengths and/or different sampling rates. Our indexing scheme employs a disk-based suffix tree as an ind...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 1999

Fast Retrieval of Similar Subsequences in Long Sequence Databases

نویسندگان

چکیده

منابع مشابه

gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences

Similarity search of time-warped subsequences via a suffix tree

Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases

Faster sequence homology searches by clustering subsequences

Similarity-Based Subsequence Search in Image Sequence Databases

عنوان ژورنال:

اشتراک گذاری